
    Robot Fast Adaptation to Changes in Human Engagement During Simulated Dynamic Social Interaction With Active Exploration in Parameterized Reinforcement Learning

    Dynamic, uncontrolled human-robot interactions (HRIs) require robots to be able to adapt to changes in the human's behavior and intentions. Among relevant signals, non-verbal cues such as the human's gaze can provide the robot with important information about the human's current engagement in the task, and whether the robot should continue its current behavior or not. However, robots' reinforcement learning (RL) abilities to adapt to such non-verbal cues are still underdeveloped. Here, we propose an active exploration algorithm for RL during HRI where the reward function is the weighted sum of the human's current engagement and the variations of this engagement. We use a parameterized action space in which a meta-learning algorithm simultaneously tunes exploration in the discrete action space (e.g., moving an object) and in the space of continuous movement characteristics (e.g., velocity, direction, strength, and expressivity). We first show that this algorithm reaches state-of-the-art performance in the non-stationary multi-armed bandit paradigm. We then apply it to a simulated HRI task and show that it outperforms continuous parameterized RL with either passive or active exploration based on different existing methods. We finally evaluate it in a more realistic version of the same HRI task, where a practical approach is followed to estimate human engagement through visual cues of the head pose. The algorithm can detect and adapt to perturbations in human engagement of different durations. Altogether, these results suggest a novel, efficient, and robust framework for robot learning during dynamic HRI scenarios.
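
    A minimal sketch of the weighted-sum reward described above, in Python. The weight names w_level and w_change are illustrative assumptions; the abstract does not give the actual weighting scheme:

        # Sketch of the reward: a weighted sum of the current engagement
        # estimate e_t and its recent variation (e_t - e_prev). w_level and
        # w_change are illustrative assumptions, not the paper's values.
        def engagement_reward(e_t: float, e_prev: float,
                              w_level: float = 0.5, w_change: float = 0.5) -> float:
            # The variation term makes a low but increasing engagement rewarding.
            return w_level * e_t + w_change * (e_t - e_prev)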

    Adaptive reinforcement learning with active state-specific exploration for engagement maximization during simulated child-robot interaction

    Using assistive robots for educational applications requires robots to be able to adapt their behavior specifically to each child with whom they interact. Among relevant signals, non-verbal cues such as the child's gaze can provide the robot with important information about the child's current engagement in the task, and whether the robot should continue its current behavior or not. Here we propose a reinforcement learning algorithm extended with active state-specific exploration and show its applicability to child engagement maximization as well as to more classical tasks such as maze navigation. We first demonstrate its adaptive nature on a continuous maze problem, an enhancement of the classic grid world. There, parameterized actions enable the agent to learn single moves until the end of a corridor, similarly to "options" but without explicit hierarchical representations. We then apply the algorithm to a series of simulated scenarios, such as an extended Tower of Hanoi where the robot should find the speed of movement appropriate for the interacting child, and a pointing task where the robot should find the child-specific appropriate level of expressivity of action. We show that the algorithm copes with both global and local non-stationarities in the state space while preserving stable behavior in the stationary portions of the state space. Altogether, these results suggest a promising way to enable robot learning from non-verbal cues despite the high degree of non-stationarity that can occur during interaction with children.
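
    As an illustration of the active state-specific exploration idea, the sketch below gives each state its own exploration rate, raised when the unsigned reward-prediction error in that state grows (a hint of local non-stationarity) and decayed otherwise. The class name, parameters, and update rule are assumptions for illustration, not the authors' algorithm:

        import random
        from collections import defaultdict

        class StateSpecificExplorer:
            """Tabular agent with a per-state exploration rate (illustrative sketch)."""

            def __init__(self, actions, alpha=0.1, eps_min=0.05, eps_max=1.0,
                         pe_threshold=0.2, eps_lr=0.1):
                self.q = defaultdict(float)              # (state, action) -> value estimate
                self.eps = defaultdict(lambda: eps_max)  # per-state exploration rate
                self.actions = actions
                self.alpha, self.eps_min, self.eps_max = alpha, eps_min, eps_max
                self.pe_threshold, self.eps_lr = pe_threshold, eps_lr

            def act(self, state):
                # Epsilon-greedy, but with a state-specific epsilon.
                if random.random() < self.eps[state]:
                    return random.choice(self.actions)
                return max(self.actions, key=lambda a: self.q[(state, a)])

            def update(self, state, action, reward):
                delta = reward - self.q[(state, action)]   # reward-prediction error
                self.q[(state, action)] += self.alpha * delta
                # A large |delta| suggests this state became non-stationary, so
                # exploration is raised in this state only; stationary states
                # keep decaying toward eps_min and their behavior stays stable.
                target = self.eps_max if abs(delta) > self.pe_threshold else self.eps_min
                self.eps[state] += self.eps_lr * (target - self.eps[state])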

    Online adaptation to human engagement perturbations in simulated human-robot interaction using hybrid reinforcement learning

    Dynamic, uncontrolled human-robot interaction requires robots to be able to adapt to changes in the human's behavior and intentions. Among relevant signals, non-verbal cues such as the human's gaze can provide the robot with important information about the human's current engagement in the task, and whether the robot should continue its current behavior or not. In previous work [1] we proposed an active exploration algorithm for reinforcement learning where the reward function is the weighted sum of the human's current engagement and the variations of this engagement (so that a low but increasing engagement is rewarding). We used a structured (parameterized) continuous action space in which a meta-learning algorithm simultaneously tunes exploration in the discrete and continuous action spaces, enabling the robot to learn which discrete action is expected by the human (e.g., moving an object) and with which velocity of movement. In this paper we evaluate the performance of the algorithm on a simulated human-robot interaction task where a practical approach is followed to estimate human engagement through visual cues of the head pose. We then measure the adaptation of the algorithm to engagement perturbations, simulated as changes in the optimal action parameter, and quantify its performance for varying perturbation durations and levels of measurement noise.
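
    For concreteness, one plausible head-pose-based engagement proxy of the kind alluded to above maps the head's angular deviation from the robot's direction to an engagement value in [0, 1], with optional Gaussian measurement noise. The cosine mapping and the noise model are illustrative assumptions, not the paper's estimator:

        import math
        import random

        def engagement_from_head_pose(yaw_deg: float, pitch_deg: float,
                                      noise_std: float = 0.0) -> float:
            """Map head-pose deviation from the robot's direction to [0, 1]."""
            dev = math.radians(math.hypot(yaw_deg, pitch_deg))  # combined deviation
            e = math.cos(min(dev, math.pi / 2))                 # 1 when facing the robot
            e += random.gauss(0.0, noise_std)                   # simulated measurement noise
            return min(1.0, max(0.0, e))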
